DAOS-16837 container: Add client-side DFS metrics #15544
Ticket title is 'Add DFS-level metrics to client telemetry' |
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15544/1/testReport/ |
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15544/1/testReport/ |
46adb31
to
c8db771
Compare
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15544/2/testReport/ |
c8db771
to
f2cd61b
Compare
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15544/3/execution/node/1131/log |
768a10d
to
bc4444e
Compare
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15544/6/testReport/ |
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15544/7/testReport/ |
bc4444e
to
5d423d5
Compare
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15544/8/execution/node/1506/log |
5d423d5
to
1e800cb
Compare
@mchaarawi: I've removed the dependency on the new container property for now, as I'd prefer not to hold this patch up if that becomes a sticking point. Please take a look and let me know if there are any significant changes that you'd like. I can add some ftest coverage for the new DFS metrics if it looks OK. @ashleypittman: I based this patch on the stats you added to dfuse. It's a bit heavier-weight because it relies on the telemetry library instead of atomics, but maybe there's an opportunity to merge the approaches? |
Just some minor issues that need to be addressed.
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15544/10/execution/node/1543/log |
Looks good to me.
It would be great if we could enable metrics for something like our existing ior_small.py and mdtest_small.py CI tests and do some verification there. Whether that happens in this PR or another PR is fine with me.
cae9012
to
c270e79
Compare
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15544/12/execution/node/1211/log |
I suspect the copyright GHA is failing because the workflow comes from a merge of this PR and master, but the source tree used is just this PR. That's something I'll need to consider in the future for new GHAs. Anyway, the copyright GHA is not required, so it can be ignored. |
c270e79
to
a5c512d
Compare
If metrics are enabled for a POSIX container, create a new container/$UUID/dfs metrics root in the client telemetry to provide DFS-oriented metrics (POSIX ops, file I/Os, etc). Also fixes a bug in the agent code for pruning unused client telemetry segments. Features: container telemetry Required-githooks: true Signed-off-by: Michael MacDonald <[email protected]>
57f04a7
to
01944a0
Compare
Need to address the consider-using-generator pylint error.
Argh. Do you guys have a preferred solution for this? Seems like overkill compared to a list comprehension. Should I just add an ignore? |
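For context, pylint's consider-using-generator check is raised when a list comprehension is passed directly to a reducing builtin such as sum(). The flagged line is not shown in this thread, so the snippet below is only a hypothetical illustration with made-up names, not the code from this patch. It sketches the two usual ways to resolve the warning: drop the square brackets, or keep the list and suppress the check inline.

```python
# Hypothetical sample data; not taken from the patch under review.
op_counts = {"read": 3, "write": 4, "stat": 0}

# Form that pylint flags: a temporary list is built just to be summed.
total_list = sum([count for count in op_counts.values()])

# Option 1: pass a generator expression instead (no intermediate list).
total_gen = sum(count for count in op_counts.values())

# Option 2: keep the list comprehension and silence the check on that line.
total_ignored = sum([count for count in op_counts.values()])  # pylint: disable=consider-using-generator
```

Both options give the same result; the generator form simply avoids allocating the intermediate list.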
Required-githooks: true Signed-off-by: Michael MacDonald <[email protected]>
I didn't look as deeply into the DFS changes. Go and telemetry changes LGTM.
Required-githooks: true Signed-off-by: Michael MacDonald <[email protected]>
@mjmac Is this correct? Total
|
Yep. It's a Cumulative Histogram. That's the format that Prometheus likes. |
Interesting :) I don't think I've seen that before |
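As a small illustration of the cumulative histogram format mentioned above: in a Prometheus-style histogram, each bucket reports the number of observations less than or equal to its upper bound, so the last (+Inf) bucket equals the total count. The bucket bounds and counts below are invented for the example and are not output from this patch.

```python
from itertools import accumulate

# Hypothetical upper bounds (bytes) and per-bucket (non-cumulative) counts.
upper_bounds = [256, 1024, 4096, float("inf")]
per_bucket = [5, 3, 0, 2]

# Cumulative form: each entry counts all observations <= that upper bound.
cumulative = list(accumulate(per_bucket))

for bound, count in zip(upper_bounds, cumulative):
    print(f"le={bound}: {count}")
```

Read this way, the final bucket doubling as the total is expected behavior rather than a double count.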
@phender: Nudge... Please advise on how you want me to address your -1 review. |
Approving ftest changes.
@mchaarawi, @kjacque: Should be good to go now. |
If metrics are enabled for a POSIX container, create a new pool/$UUID/container/$UUID/dfs metrics root in the client telemetry to provide DFS-oriented metrics (POSIX ops, file I/Os, etc). Also fixes a bug in the agent code for pruning unused client telemetry segments. Required-githooks: true Signed-off-by: Michael MacDonald <[email protected]>
DAOS-16837 container: Add client-side DFS metrics (#15544)
If metrics are enabled for a POSIX container, create
a new pool/$UUID/container/$UUID/dfs metrics root in
the client telemetry to provide DFS-oriented metrics
(POSIX ops, file I/Os, etc).
Also fixes a bug in the agent code for pruning unused
client telemetry segments.
Features: container telemetry
Signed-off-by: Michael MacDonald [email protected]